Story Segmentation and Topic Detection in the Broadcast News Domain
Authors
Abstract
In this paper we present algorithms for story segmentation and topic detection. Both are online algorithms that combine machine learning, statistical natural language processing, and information retrieval techniques. The story segmentation algorithm is a two-stage algorithm that uses a decision-tree-based probabilistic model in the first stage and incorporates aspects of our detection system via an information-retrieval-based refinement scheme in the second stage. The topic detection algorithm is an incremental clustering algorithm that employs a novel dynamic cluster-dependent similarity measure between documents and clusters. On the 1998 TDT2 Evaluation, these algorithms achieve a C_seg of 0.1651 and a topic-weighted C_det of 0.0042, respectively.
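To make the idea of online incremental clustering concrete, here is a minimal sketch. It is not the authors' method: the abstract does not specify the dynamic cluster-dependent similarity measure, so plain cosine similarity over raw term counts stands in for it, and the function names, the whitespace tokenizer, and the `threshold` value are all assumptions chosen for illustration.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def incremental_cluster(docs, threshold=0.3):
    """Assign each document, in arrival order, to its closest cluster,
    or open a new cluster when no existing one is similar enough.

    Assumption: cosine similarity replaces the paper's unspecified
    cluster-dependent measure; threshold=0.3 is an arbitrary choice.
    """
    centroids = []  # one Counter of summed term counts per cluster
    labels = []
    for text in docs:
        vec = Counter(text.lower().split())
        scores = [cosine(vec, c) for c in centroids]
        best = max(range(len(scores)), key=scores.__getitem__, default=None)
        if best is not None and scores[best] >= threshold:
            centroids[best].update(vec)  # fold the document into the cluster
            labels.append(best)
        else:
            centroids.append(vec)  # start a new cluster (a new topic)
            labels.append(len(centroids) - 1)
    return labels


stories = [
    "stock market falls sharply",
    "stock market rallies",
    "hurricane hits coast",
    "hurricane damages coast towns",
]
print(incremental_cluster(stories))
```

Each document is seen once and never reassigned, which is what makes the procedure online. A cluster-dependent measure, as the abstract describes, would replace the single global `cosine` call with a score that varies by cluster.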
Similar resources
Story segmentation and topic detection for recognized speech
In this paper we present algorithms for story segmentation, topic detection, and topic tracking. The algorithms use a combination of machine learning, statistical natural language processing and information retrieval techniques. The story segmentation algorithm is a two-stage algorithm that uses a decision-tree-based probabilistic model in the first stage and incorporates aspects of our topic det...
A Modified Character Segmentation Algorithm for Farsi Printed Text Using Upper Contour Labelling
In this paper, a modified segmentation algorithm for printed Farsi words is presented. This algorithm is based on a previous work by Azmi that uses the conditional labeling of the upper contour to find the segmentation points. The main objective is to improve the segmentation results for low quality prints. To achieve this, various modifications on local baseline detection, contour labeling an...
Two-stage Story Segmentation and Detection on Broadcast News Using Genetic Algorithm
This paper proposes a two-stage story segmentation and detection approach on Mandarin broadcast news. In the two-stage paradigm, a topic classifier is first constructed to find the topic on the broadcast news within a sliding window and determine the potential story boundaries. Then, the problem for story segmentation is transformed to the determination of a chromosome (number sequence) in a se...
Traffic Scene Analysis using Hierarchical Sparse Topical Coding
Analyzing motion patterns in traffic videos can be exploited directly to generate high-level descriptions of the video contents. Such descriptions may further be employed in different traffic applications such as traffic phase detection and abnormal event detection. One of the most recent and successful unsupervised methods for complex traffic scene analysis is based on topic models. In this pa...